Lead Software Engineer (AI Ops and Resilience)

Job Location
Singapore, Asia Pacific
Job Role
Engineer
Contract Type
Full-Time
Salary
Posted Date
2025-11-18
Job Expiry Date
2025-12-18
Qualification
Bachelor’s Degree

Job Description


We are looking for a technically strong and innovative Lead Software Engineer to spearhead the evolution of our IT operations through intelligent automation and hands-on engineering. 

This role is pivotal in enhancing operational resilience, streamlining IT service management (ITSM) processes, and developing the next generation of self-healing, predictive IT operations. You will work in close collaboration with teams across IT.

Key Responsibilities 


  • Reimagine and enhance core ITSM practices (Incident, Problem, Change, and Knowledge Management) using modern development frameworks and automation tools.
  • Design, prototype, and implement AI-driven operational tools, including predictive incident detection, automated remediation workflows, intelligent alerting, and large language model (LLM)-based knowledge agents.
  • Lead the development and deployment of custom automation solutions to improve IT service reliability and reduce manual workload across ITSM domains.
  • Collaborate with platform teams, enterprise architects, and developers to conceptualize and build next-generation IT operational capabilities.
  • Provide mentorship and guidance to ITSM IPC (Incident, Problem, Change, and DR management) Engineers, ensuring effective execution and governance of ITSM processes aligned with ITIL best practices.
  • Drive adoption and continuous improvement of ITSM best practices across all IT teams.


Oversee operational aspects of the IT Command Centre and Helpdesk, including:


•  Acting as the primary liaison between internal stakeholders and external service providers.

•  Monitoring and managing performance of vendor-managed services to ensure SLA and KPI compliance.

•  Participating in service reviews, audits, and performance assessments.

•  Managing Incident, Problem, and Change Management processes across vendor operations.

•  Leading continuous improvement initiatives and service enhancements.

•  Supporting escalation management and root cause analysis efforts.

 

Requirements



•  Bachelor’s Degree in Computer Science, Engineering, or a related field (or equivalent experience).

•  5+ years of experience in IT operations or substantial exposure to ITSM processes and tooling.

•  Strong understanding of the ITIL framework and ITSM best practices; ITIL v3/v4 certification is preferred.

•  Hands-on experience with automation tools, scripting, and AI/ML technologies relevant to IT operations.

•  Proficiency with ITSM platforms such as ServiceNow, BMC Remedy, or similar tools.

•  Demonstrated ability to mentor technical teams and lead cross-functional collaboration.

•  Excellent problem-solving, communication, and stakeholder management skills.

•  Hands-on software development or scripting experience in Python, JavaScript (Node.js), or similar languages.

•  Experience with monitoring and observability platforms such as Splunk, Grafana, ScienceLogic, or equivalent (advantageous).

•  Familiarity with CI/CD pipelines, GitOps practices, cloud platforms (AWS, Azure, GCP), and Infrastructure-as-Code (IaC) tools (advantageous).

•  Proficiency with AI/ML frameworks and tools (e.g., TensorFlow, scikit-learn, LangChain, OpenAI APIs) is a strong advantage.

•  A passion for innovation and continuous improvement.

